CVPR Paper Reading: FOTS: Fast Oriented Text Spotting with a Unified Network

Original article by Luo Canjie (罗灿杰), CSIG Technical Committee on Document Image Analysis and Recognition (CSIG文档图像分析与识别专委会), 2022-07-11

CVPR 2018 Paper Reading

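Detecting and recognizing scene text is an important task in computer vision. Many existing methods handle detection and recognition as two separate stages. For scene text, however, the two are closely related. For example, they rely on many of the same text features, so they should not be treated as isolated tasks.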

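This CVPR 2018 paper from SenseTime and the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, proposes an end-to-end trainable network for scene text detection and recognition that shares visual information and computation between the two tasks. The network uses the RoIRotate operation to share convolutional features, and it uses the fine-grained supervision from recognition to strengthen training. It reports improved accuracy on ICDAR 2013 [1], ICDAR 2015 [2], and ICDAR 2017 MLT [3], while also meeting real-time requirements at up to 22.6 fps.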

The overall architecture of the network is shown in the figure below:

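The shared convolutional layers form a fully convolutional network. The detection branch predicts text boxes from the shared feature maps. The recognition branch accesses the same feature maps through the RoIRotate operation and uses a recurrent neural network (RNN) with a Connectionist Temporal Classification (CTC) [4] decoder to recognize the text.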
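To make the CTC decoding step concrete, here is a minimal sketch of best-path (greedy) CTC decoding: take the most likely class in each frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the toy alphabet, and the choice of greedy rather than beam-search decoding are assumptions for illustration, not details taken from the paper.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: list of per-frame probability lists (one entry per class).
    alphabet:    list mapping class index to character; index `blank` is unused.
    """
    # Most likely class index for every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Toy example (hypothetical data): classes are blank, "c", "a", "t".
alphabet = ["", "c", "a", "t"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.10, 0.80, 0.05, 0.05],  # c (repeat, merged)
    [0.90, 0.04, 0.03, 0.03],  # blank
    [0.10, 0.05, 0.80, 0.05],  # a
    [0.10, 0.05, 0.05, 0.80],  # t
]
print(ctc_greedy_decode(frames, alphabet))  # cat
```

Note that a blank between two identical labels keeps them separate, which is how CTC can still output doubled letters.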

The backbone of the shared convolutional layers is ResNet-50 [5]. Inspired by FPN [6], it uses the structure shown in the figure below:
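As a rough illustration of FPN-style top-down merging, the sketch below upsamples a coarse feature map by a factor of two and combines it with a finer one. Everything here is an assumption made for illustration: the helper names `upsample2x` and `merge_top_down`, nearest-neighbour upsampling, and channel concatenation as the merge operator. The exact layers used in FOTS should be checked against the paper.

```python
def upsample2x(channel):
    """Nearest-neighbour 2x upsampling of one channel stored as a 2-D list."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                             # duplicate rows
    return out


def merge_top_down(coarse, fine):
    """Upsample `coarse` (list of channels) and concatenate it with `fine`.

    Both inputs are lists of 2-D lists (channels x height x width); `fine`
    must have exactly twice the spatial size of `coarse`.
    """
    upsampled = [upsample2x(channel) for channel in coarse]
    if (len(upsampled[0]), len(upsampled[0][0])) != (len(fine[0]), len(fine[0][0])):
        raise ValueError("fine map must be 2x the size of the coarse map")
    return upsampled + fine  # concatenation along the channel axis


# Hypothetical toy maps: one coarse channel (1x2) and one fine channel (2x4).
coarse = [[[1, 2]]]
fine = [[[0, 0, 0, 0], [0, 0, 0, 0]]]
merged = merge_top_down(coarse, fine)
print(len(merged), merged[0])
```

In a real network a convolution would normally follow the merge to mix the two sources; that step is omitted here.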

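The detection branch borrows from EAST [7] and can predict text boxes in different orientations. Its objective consists of a classification loss and a regression loss, where the regression loss includes a term for the rotation angle. Training uses OHEM [8] to mine hard examples.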
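The pieces of such a detection objective can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact formulation: the box loss is an EAST-style `-log(IoU)` on per-pixel distances to the four box sides, the angle term is `1 - cos(difference)`, the weight `lambda_theta` is a hypothetical default, and `ohem_select` is a generic hard-negative rule rather than the sampling scheme used by the authors.

```python
import math


def angle_loss(theta_pred, theta_gt):
    """Rotation term: 0 when the angles agree, growing as they diverge."""
    return 1.0 - math.cos(theta_pred - theta_gt)


def iou_loss(pred, gt):
    """-log(IoU) of two boxes given as positive (top, right, bottom, left)
    distances measured from the same pixel."""
    t_p, r_p, b_p, l_p = pred
    t_g, r_g, b_g, l_g = gt
    area_pred = (t_p + b_p) * (l_p + r_p)
    area_gt = (t_g + b_g) * (l_g + r_g)
    inter = (min(t_p, t_g) + min(b_p, b_g)) * (min(l_p, l_g) + min(r_p, r_g))
    union = area_pred + area_gt - inter
    return -math.log(inter / union)


def regression_loss(pred, gt, theta_pred, theta_gt, lambda_theta=10.0):
    """Per-pixel regression loss: box term plus weighted angle term.
    lambda_theta is an assumed weight, chosen only for illustration."""
    return iou_loss(pred, gt) + lambda_theta * angle_loss(theta_pred, theta_gt)


def ohem_select(losses, is_positive, neg_ratio=3):
    """Keep every positive pixel plus the highest-loss negatives,
    at most neg_ratio negatives per positive."""
    positives = [i for i, flag in enumerate(is_positive) if flag]
    negatives = [i for i, flag in enumerate(is_positive) if not flag]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    hardest = negatives[: neg_ratio * max(1, len(positives))]
    return sorted(positives + hardest)


# Hypothetical per-pixel classification losses; pixel 0 is the only positive.
print(ohem_select([0.1, 0.9, 0.5, 0.2, 0.8], [True, False, False, False, False], neg_ratio=2))
# [0, 1, 4]
```

Only the selected pixels would contribute to the classification loss, so easy background pixels do not dominate training.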

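The feature-sharing method is one of the paper's main contributions. The proposed RoIRotate avoids the feature loss caused by max-pooling in RRPN [9] and preserves the correspondence between each RoI and its features.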

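Similar to a perspective transformation, RoIRotate re-interpolates the text region on the feature map to produce a new feature map of variable length. The paper also gives the detailed mathematical formulation: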
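The resampling idea can be sketched in a few lines: for each pixel of a fixed-height output, compute where it falls inside the rotated box on the source feature map and read the value there with bilinear interpolation. This is a simplified, single-channel illustration. The box parameterization (centre, size, angle), the names `roi_rotate` and `bilinear`, and the default `out_h` are assumptions and do not reproduce the paper's affine parameters.

```python
import math


def bilinear(feat, x, y):
    """Bilinearly sample the 2-D list `feat` at real-valued (x, y).
    Pixel (col, row) sits at integer coordinates; outside counts as zero."""
    height, width = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xx < width and 0 <= yy < height:
                value += wx * wy * feat[yy][xx]
    return value


def roi_rotate(feat, cx, cy, box_w, box_h, theta, out_h=8):
    """Resample a rotated box into an axis-aligned map of height out_h.
    The output width keeps the box's aspect ratio, so it varies per box."""
    scale = out_h / box_h
    out_w = max(1, round(box_w * scale))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for j in range(out_h):
        row = []
        for i in range(out_w):
            # Offset of this output pixel from the box centre, in source units.
            u = (i + 0.5) / scale - box_w / 2
            v = (j + 0.5) / scale - box_h / 2
            # Rotate the offset by theta and move it to the box centre.
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            row.append(bilinear(feat, x, y))
        out.append(row)
    return out


# Hypothetical 4x6 feature map whose value encodes its own position.
feat = [[col + 10 * row for col in range(6)] for row in range(4)]
patch = roi_rotate(feat, cx=2.5, cy=1.5, box_w=4, box_h=2, theta=0.0, out_h=2)
print(patch)
```

Because every output value is interpolated from its exact source location, no pooling is involved, and the fixed output height with variable width suits a sequence recognizer that reads the features left to right.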

The recognition branch uses an approach similar to CRNN [10], i.e. a CNN-LSTM structure, in which the CNN follows the design of VGG [11].

The objective function of the recognition branch is:
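For a CTC-based recognizer, this objective is typically the negative log-likelihood of the ground-truth label sequence, where the likelihood sums over every frame-level path that collapses to that sequence. The brute-force sketch below makes that definition explicit on a tiny example. It enumerates all paths, so it is exponential in the number of frames and only meant for illustration (real implementations use the forward-backward algorithm). The function names are hypothetical.

```python
import itertools
import math


def collapse(path, blank=0):
    """CTC collapse: merge consecutive repeats, then remove blanks."""
    out, previous = [], None
    for label in path:
        if label != previous and label != blank:
            out.append(label)
        previous = label
    return tuple(out)


def ctc_probability(frame_probs, target, blank=0):
    """p(target | input): total probability of all paths collapsing to target."""
    num_classes = len(frame_probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=len(frame_probs)):
        if collapse(path, blank) == tuple(target):
            total += math.prod(p[k] for p, k in zip(frame_probs, path))
    return total


def recognition_loss(batch):
    """Mean negative log-likelihood over (frame_probs, target) pairs."""
    return -sum(math.log(ctc_probability(fp, t)) for fp, t in batch) / len(batch)


# Two frames, two classes (blank and one character), uniform probabilities.
probs = [[0.5, 0.5], [0.5, 0.5]]
# Paths (1,0), (0,1) and (1,1) all collapse to the single character.
print(ctc_probability(probs, [1]))  # 0.75
print(recognition_loss([(probs, [1])]))
```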

The objective function of the whole network must balance the detection loss against the recognition loss:
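A weighted sum is the usual way to express this kind of trade-off. The form below is a sketch consistent with the description above (a detection loss made of classification and regression parts, plus a recognition loss). The symbol names and the weights \lambda_{\text{reg}} and \lambda_{\text{recog}} are notational assumptions, and their values should be taken from the paper.

```latex
L_{\text{detect}} = L_{\text{cls}} + \lambda_{\text{reg}} \, L_{\text{reg}},
\qquad
L = L_{\text{detect}} + \lambda_{\text{recog}} \, L_{\text{recog}}
```

A larger \lambda_{\text{recog}} pushes the shared features toward what the recognizer needs, while a smaller one favours the detector.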

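Training uses Synth800k [12], released at CVPR 2016. In addition, the training set of each benchmark is added according to that benchmark's characteristics. The results are shown below; the network achieves the highest accuracy in almost every case: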

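The paper emphasizes that joint end-to-end training of detection and recognition helps remove text-like background regions that would otherwise cause false detections, which yields better detection results.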

A detailed comparison of parameter counts and speed also shows that the end-to-end network needs less storage and runs faster.

Finally, the paper presents example images that show the network's detection and recognition results.

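The paper's biggest highlight is the way it samples the shared feature maps, which is also one of the current research focuses in end-to-end scene text detection and recognition. RoIRotate uses a more flexible interpolation scheme that preserves much of the detail in the feature map and also rotates the local feature map to horizontal, which makes it an advanced method for sampling shared features. It also suggests a direction worth thinking about: how to introduce image-processing algorithms or geometric constraints into deep neural networks in order to constrain the solution space of the optimization objective.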

References

[1] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras. ICDAR 2013 robust reading competition. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, pages 1484–1493. IEEE, 2013.

[2] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pages 1156–1160. IEEE, 2015.

[3] ICDAR 2017 robust reading competitions. http://rrc.cvc.uab.es/. Online; accessed 2018-5-13.

[4] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning, pages 369–376. ACM, 2006.

[5] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[6] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144, 2016.

[7] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang. EAST: An efficient and accurate scene text detector. arXiv preprint arXiv:1704.03155, 2017.

[8] A. Shrivastava, A. Gupta, and R. Girshick. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 761–769, 2016.

[9] J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue. Arbitrary-oriented scene text detection via rotation proposals. arXiv preprint arXiv:1703.01086, 2017.

[10] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on pattern analysis and machine intelligence, 2016.

[11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[12] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2315–2324, 2016.

[Tip] A video demo of the paper's detection and recognition results can be watched at: http://www.iqiyi.com/w_19rv1nnfjh.html (copy the link and open it in a browser)

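About the author of this post: Luo Canjie (罗灿杰) is a PhD student at South China University of Technology whose main research interests are scene text detection and recognition, and deep learning and its applications.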
